My (SeptemberM) Initial Impressions

I think there is some confusion here, not because of any individual, but because it appears to me that a step or two has been missed in the process of defining (and ultimately developing) a better GEDCOM. I have great respect for all the work that has been done and the progress made, and in the interest of helping that progress continue, I offer the following perspective on the project as it stands today, 16 Feb 2011. All thoughts, reactions, or questions are welcomed.

The Missing Steps

1. There hasn’t yet been a true definition of what a GEDCOM is. My attempt at such a definition: a machine-readable file intended to transport genealogical information between applications, and which can be archived with the intention of being readable and understandable by applications in the future.

2. The question of which comes first, the chicken or the egg, has to be settled. The software development community has long debated which should come first: the data requirements or the choice of language, application, and platform. In my experience, choosing the language/application/platform first is common in the business community, in the interest of leveraging previous investments, and is what most computer professionals have experienced. However, this choice causes the designers to impose the limitations inherent in that choice onto the data requirements. The requirements are automatically filtered through the chosen platform to figure out how (or even whether) the data will fit into it, and the results reflect those limitations. In practice, the resulting applications are often not meant to remain viable for more than a few years before changes in hardware, software, and business needs require new ones, so this approach has historically “worked.” When the luxury of choosing the other approach is available, as it is to this group, all the data requirements are gathered and examined first, and only then is a language/application/platform chosen that will accommodate as many of those requirements as possible. The result is much more successful and has the potential for much greater longevity. I realize this may cause a re-evaluation of some of the goals, but it is my observation that the difficulty in agreeing on these goals is due to the prematurity of the decisions relating to them.

3. The task of defining a better GEDCOM has not yet been properly scoped. Looking at the data requirements of the genealogical community is a daunting task, particularly when we consider the range of needs and knowledge of that community –

potential users range from beginning hobbyists to experienced professionals to application developers;
the data ranges from basic information to massive genealogies with multiple file formats and thousands of individuals (and the corresponding documentation);
the research process, data collection, analysis, and ranking (i.e. quality, usefulness, tracking) all need to be recorded, and any part of this data must be promotable from research to “documentation” that can be connected to one or more individuals;
the data may be maintained as private, on public websites, and in collaborative environments, or any combination of these.

BetterGEDCOM Visualization -- What is Required and Possible?

It is always possible that a better GEDCOM can’t be all things to all people, but, being an eternal optimist, I think that it can. What is necessary is for us to have a better visualization of these data requirements, of the genealogical view of the world. The best analogy I can think of right now is one most people are familiar with: Leonardo da Vinci’s drawing of a man with his hands and feet touching a circle surrounding him. Now imagine this image as a 3D representation in which the man is standing within a sphere, and the points where his fingers and toes meet the sphere are points of connection, i.e. relationships, with the world around him. This is a good way to look at any individual within a family tree. It reminds us that beyond the bare statistics of personal information (birth, marriage, death) and the point-in-time snapshots of census counts, etc., each individual, at any time in history, is at the center of a network of connections: to family, friends, co-workers, acquaintances, church, government, societies and organizations, and so on. It is the genealogist’s challenge to discover and record both the bare statistics and these networked connections and relationships. In fact, most genealogists eventually find themselves having to use these connections and relationships in order to discover (or infer) the bare statistics/personal information for an individual. Thus they are interdependent components of the data requirements. Add to this the complication that the nature of these connections changes and develops over time. Consider also the need to accommodate the research being done to discover these connections, and the requirement that both the personal information and the connections carry (or contain) documentation, e.g. connections to the research.
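
To make this picture a little more concrete, here is a minimal sketch in Python of how connections that change over time and carry documentation might be modeled. It is only an illustration of the idea, not a proposed BetterGEDCOM structure: the names Source, Connection, active_in, and connections_at are my own assumptions, and the people, organizations, and dates are invented.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Source:
    """A piece of research or documentation (hypothetical shape)."""
    title: str


@dataclass
class Connection:
    """A link from one individual to a person or organization.

    The start/end years give the connection its time dimension;
    None means the boundary is unknown or open-ended.
    """
    subject: str
    other: str
    kind: str
    start_year: Optional[int] = None
    end_year: Optional[int] = None
    sources: List[Source] = field(default_factory=list)

    def active_in(self, year: int) -> bool:
        """Return True if the connection covers the given year."""
        after_start = self.start_year is None or self.start_year <= year
        before_end = self.end_year is None or year <= self.end_year
        return after_start and before_end


def connections_at(connections: List[Connection], subject: str, year: int) -> List[Connection]:
    """Return the subject's connections that were active in a given year."""
    return [c for c in connections if c.subject == subject and c.active_in(year)]


# Invented example data, for illustration only.
census = Source("1880 census household listing (invented example)")
network = [
    Connection("John Smith", "Mary Smith", "spouse", 1872, 1901, [census]),
    Connection("John Smith", "First Parish Church", "member", 1860, None),
    Connection("John Smith", "Smith & Jones Mill", "employee", 1865, 1879),
]

kinds_1880 = sorted(c.kind for c in connections_at(network, "John Smith", 1880))
print(kinds_1880)  # ['member', 'spouse']
```

Asking the same question for a different year gives a different "sphere" around the same individual, which is the point of the analogy: the connections, their time spans, and the documentation behind them are all part of the data.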

Conclusion

Through this representation it should become apparent that the larger part of the challenge of communicating genealogical information is in representing these relationships, these networks of connection between individuals, not only in a 3D, multiple instance way, but also with the 4D aspect of time. When we see it in this way, the difficulty of representing this information in a 2D flat file (of any format) becomes obvious. In order for a specification of genealogical data to be better than what currently exists, it has to allow for all of this information in order for genealogists to truly be able to “tell their stories,” through whatever platform or application vehicle they choose, now or in the future. This is the challenge, and it is one I'm sure we can meet together.